How to Initialize an LLMP Job
When using large language models in programming projects, a common challenge is getting reliable outputs from the model. Good results are possible if you carefully craft prompts for each task, generate and select few-shot examples, and tune parameters such as temperature and top_k. However, this process is time-consuming and requires a lot of manual work. During development it is a hurdle that pulls you out of your flow state and works against fast iteration.
For text generation and creative tasks, long- or short-form free-text output may be sufficient. Programming tasks, however, need output in a structured and reliable format. To integrate LLM generation into a programming project without leaving the development flow state, we want to reduce the time spent on prompt engineering as much as possible. An LLMP unit of work (a Job) is therefore reduced to the minimal effort needed to define the task: defining its input and output models. The initialization event then handles example generation and instruction generation, and runs an optimization process to craft a reliable prompt from them. Each Job is stored in the default or a custom job directory and can be reused within your project by referencing its job ID or (optionally) its job name.
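Storing a Job and loading it again by ID or by name can be pictured with the following minimal sketch. It uses only the standard library; the JSON layout and the functions `save_job` and `load_job` are hypothetical illustrations of the idea, not LLMP's actual storage format or API.

```python
import json
import tempfile
import uuid
from pathlib import Path
from typing import Optional

# Hypothetical sketch, NOT LLMP's real storage format: each job is saved as
# one JSON file named after its ID; the human-readable name is stored inside.

def save_job(job_dir: Path, name: str, input_fields: dict, output_fields: dict) -> str:
    """Store a job definition and return its generated ID."""
    job_id = uuid.uuid4().hex
    record = {
        "job_id": job_id,
        "job_name": name,
        "input_model": input_fields,
        "output_model": output_fields,
    }
    job_dir.mkdir(parents=True, exist_ok=True)
    (job_dir / f"{job_id}.json").write_text(json.dumps(record))
    return job_id

def load_job(job_dir: Path, key: str) -> Optional[dict]:
    """Look a job up by ID first, then fall back to matching its name."""
    by_id = job_dir / f"{key}.json"
    if by_id.exists():
        return json.loads(by_id.read_text())
    for path in job_dir.glob("*.json"):
        record = json.loads(path.read_text())
        if record["job_name"] == key:
            return record
    return None

# Usage: a temporary directory stands in for the (hypothetical) job directory
job_dir = Path(tempfile.mkdtemp()) / "jobs"
job_id = save_job(
    job_dir,
    "Book to Genre",
    {"book_title": "str", "book_author": "str", "release_year": "int"},
    {"genre": "str"},
)
print(load_job(job_dir, "Book to Genre")["job_id"] == job_id)
```

Keying the file by ID keeps lookups by ID direct, while a name lookup has to scan the stored records; this mirrors the idea that the ID is the primary reference and the name is optional.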
There are different ways to initialize a Job, which we present in the following notebooks:
- Initialize a Job by defining the input and output model
- Initialize a Job
Define Input and Output Model using Pydantic
In this example we define a Job for a simple labeling task: determining the genre of a book. To restrict the possible labels we use the Literal type from the standard typing module, and to define the input and output models we use Pydantic.
from typing import Literal
from pydantic import BaseModel
from llmp.services.program import Program

class InputObject(BaseModel):
    book_title: str
    book_author: str
    release_year: int

class OutputObject(BaseModel):
    genre: Literal["fiction", "non-fiction", "fantasy", "sci-fi", "romance", "thriller", "horror", "other"]

# Initialize a job
program = Program("Book to Genre", input_model=InputObject, output_model=OutputObject)
# load an existing job by name instead of creating it
# program = Program("Book to Genre")

input_data = {
    "book_title": "The Lord of the Rings",
    "book_author": "J. R. R. Tolkien",
    "release_year": 1954
}
program(input_data=input_data)
Besides the Literal type used above, the options can also be defined with an Enum class from the standard library or set via Pydantic's Field.
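The Enum and Literal variants describe the same label set. The sketch below uses only the standard library to show that both spellings yield the same labels. In a Pydantic model either `Genre` or `GenreLiteral` can serve as the annotation of the `genre` field; that LLMP treats both identically is an assumption here, not something verified. The names `Genre`, `GenreLiteral` and `is_valid_genre` are illustrative.

```python
from enum import Enum
from typing import Literal, get_args

# Label set written as a Literal type (as in OutputObject above)
GenreLiteral = Literal["fiction", "non-fiction", "fantasy", "sci-fi",
                       "romance", "thriller", "horror", "other"]

# The same label set written as a standard-library Enum;
# the str mixin makes each member compare equal to its string value
class Genre(str, Enum):
    FICTION = "fiction"
    NON_FICTION = "non-fiction"
    FANTASY = "fantasy"
    SCI_FI = "sci-fi"
    ROMANCE = "romance"
    THRILLER = "thriller"
    HORROR = "horror"
    OTHER = "other"

# get_args() returns the allowed values of a Literal in declaration order;
# iterating an Enum also yields its members in definition order
literal_labels = list(get_args(GenreLiteral))
enum_labels = [genre.value for genre in Genre]

def is_valid_genre(label: str) -> bool:
    """Check a raw label against the allowed set."""
    return label in literal_labels

print(literal_labels == enum_labels)
print(is_valid_genre("fantasy"), is_valid_genre("poetry"))
```

A Literal keeps the label set inline with the model, while an Enum gives the labels names that can be reused across several models.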
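Let's define a new program that, in addition to the genre, asks whether the book has a sequel and, if so, for the sequel's title.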
from typing import Literal, Optional
from pydantic import BaseModel
from llmp.services.program import Program

class InputObject(BaseModel):
    book_title: str
    book_author: str
    release_year: int

class OutputObject(BaseModel):
    genre: Literal["fiction", "non-fiction", "fantasy", "sci-fi", "romance", "thriller", "horror", "other"]
    has_sequal: bool
    # note: the default is the string "None", not Python's None object
    sequal_name: Optional[str] = "None"

# Initialize a job
program = Program("Book to Genre/Sequal", input_model=InputObject, output_model=OutputObject)
# load an existing job by name instead of creating it
# program = Program("Book to Genre/Sequal")
program.job.generation_log
[{'event_id': '6eee28be26ef44b68a7085418e8964e7',
'input': {'book_title': 'The Bible',
'book_author': 'Johannes Gutenberg',
'release_year': 1450},
'output': {'genre': 'non-fiction',
'has_sequal': False,
'sequal_name': 'None'}},
{'event_id': '9a1109c8b8fd46a0a2195a8c2dce0d63',
'input': {'book_title': 'Harry Potter',
'book_author': 'J. K. Rowling',
'release_year': 1997},
'output': {'genre': 'fantasy',
'has_sequal': True,
'sequal_name': 'Harry Potter and the Chamber of Secrets'}}]
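As printed above, the generation log is a list of plain dictionaries with `event_id`, `input` and `output` keys, so it can be inspected with ordinary Python. The sketch below copies the two entries shown above into a literal list, assuming the log keeps this shape; in the notebook you would iterate `program.job.generation_log` directly. It counts the predicted genres and collects input/output pairs, for example for manual review.

```python
from collections import Counter

# Two entries copied from program.job.generation_log above
generation_log = [
    {"event_id": "6eee28be26ef44b68a7085418e8964e7",
     "input": {"book_title": "The Bible", "book_author": "Johannes Gutenberg",
               "release_year": 1450},
     "output": {"genre": "non-fiction", "has_sequal": False, "sequal_name": "None"}},
    {"event_id": "9a1109c8b8fd46a0a2195a8c2dce0d63",
     "input": {"book_title": "Harry Potter", "book_author": "J. K. Rowling",
               "release_year": 1997},
     "output": {"genre": "fantasy", "has_sequal": True,
                "sequal_name": "Harry Potter and the Chamber of Secrets"}},
]

# How often was each genre predicted?
genre_counts = Counter(entry["output"]["genre"] for entry in generation_log)

# (input, output) pairs, e.g. to review what the job has produced so far
pairs = [(entry["input"], entry["output"]) for entry in generation_log]

for inp, out in pairs:
    print(f"{inp['book_title']} ({inp['release_year']}): {out['genre']}")
print(dict(genre_counts))
```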
input_data = {
    "book_title": "Harry Potter",
    "book_author": "J. K. Rowling",
    "release_year": 1997
}
result = program(input_data=input_data)
if result.has_sequal:
    print(f"The book {result.sequal_name} is a sequel to {input_data['book_title']}")
else:
    print(f"The book {input_data['book_title']} has no sequel")
The book Harry Potter and the Chamber of Secrets is a sequel to Harry Potter
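Because `has_sequal` and `sequal_name` are generated as separate fields, nothing in the output model itself guarantees that they agree. A minimal, hypothetical post-check in plain Python could flag disagreeing outputs before downstream code branches on them. `is_consistent` is an illustrative helper, not part of LLMP, and it treats the string "None" (the model's default) the same as a missing name.

```python
def is_consistent(output: dict) -> bool:
    """Hypothetical check: a sequel name is present exactly when has_sequal is True.

    The output model uses the *string* "None" as the default for sequal_name,
    so the string "None", the None object and an empty string all count as
    "no name given".
    """
    has_name = output.get("sequal_name") not in (None, "None", "")
    return output["has_sequal"] == has_name

# Outputs as they appear in the generation log above
outputs = [
    {"genre": "non-fiction", "has_sequal": False, "sequal_name": "None"},
    {"genre": "fantasy", "has_sequal": True,
     "sequal_name": "Harry Potter and the Chamber of Secrets"},
]

# Collect outputs whose two fields contradict each other
flagged = [out for out in outputs if not is_consistent(out)]
print(flagged)
```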